global translation
- Oceania > Australia (0.04)
- Europe > Switzerland (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
BaroPoser: Real-time Human Motion Tracking from IMUs and Barometers in Everyday Devices
Zhang, Libo, Yi, Xinyu, Xu, Feng
In recent years, tracking human motion using IMUs from everyday devices such as smartphones and smartwatches has gained increasing popularity. However, due to the sparsity of sensor measurements and the lack of datasets capturing human motion over uneven terrain, existing methods often struggle with pose estimation accuracy and are typically limited to recovering movements on flat terrain only. To this end, we present BaroPoser, the first method that combines IMU and barometric data recorded by a smartphone and a smartwatch to estimate human pose and global translation in real time. By leveraging barometric readings, we estimate sensor height changes, which provide valuable cues for both improving the accuracy of human pose estimation and predicting global translation on non-flat terrain. Furthermore, we propose a local thigh coordinate frame to disentangle local and global motion input for better pose representation learning. We evaluate our method on both public benchmark datasets and real-world recordings. Quantitative and qualitative results demonstrate that our approach outperforms the state-of-the-art (SOTA) methods that use IMUs only with the same hardware configuration.
- Asia > South Korea > Busan > Busan (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Beijing > Beijing (0.04)
HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning
Valassakis, Eugene, Garcia-Hernando, Guillermo
Predicting camera-space hand meshes from single RGB images is crucial for enabling realistic hand interactions in 3D virtual and augmented worlds. Previous work typically divided the task into two stages: given a cropped image of the hand, predict meshes in relative coordinates, followed by lifting these predictions into camera space in a separate and independent stage, often resulting in the loss of valuable contextual and scale information. To prevent the loss of these cues, we propose unifying these two stages into an end-to-end solution that addresses the 2D-3D correspondence problem. This solution enables back-propagation from camera space outputs to the rest of the network through a new differentiable global positioning module. We also introduce an image rectification step that harmonizes both the training dataset and the input image as if they were acquired with the same camera, helping to alleviate the inherent scale-depth ambiguity of the problem. We validate the effectiveness of our framework in evaluations against several baselines and state-of-the-art approaches across three public benchmarks.
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
MoST: Motion Style Transformer between Diverse Action Contents
Kim, Boeun, Kim, Jungho, Chang, Hyung Jin, Choi, Jin Young
While existing motion style transfer methods are effective between two motions with identical content, their performance significantly diminishes when transferring style between motions with different contents. This challenge lies in the lack of clear separation between content and style of a motion. To tackle this challenge, we propose a novel motion style transformer that effectively disentangles style from content and generates a plausible motion with transferred style from a source motion. Our distinctive approach to achieving the goal of disentanglement is twofold: (1) a new architecture for motion style transformer with `part-attentive style modulator across body parts' and `Siamese encoders that encode style and content features separately'; (2) style disentanglement loss. Our method outperforms existing methods and demonstrates exceptionally high quality, particularly in motion pairs with different contents, without the need for heuristic post-processing. Codes are available at https://github.com/Boeun-Kim/MoST.
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)